
feat: resolve HF Hub cache layout in model directory scanning #466

Merged
jundot merged 6 commits into jundot:main from AlexWorland:feature/hf-cache-discovery
Apr 14, 2026


@AlexWorland
Contributor

Summary

Models downloaded via huggingface-cli or huggingface_hub use an indirect cache layout (models--Org--Name/refs/main → commit hash → snapshots/<hash>/) instead of a flat directory. When a user adds their HF cache directory (e.g., ~/.cache/huggingface/hub) to model directories in settings, the scanner didn't understand this indirection and found nothing.

This change teaches the existing two-level scanner to resolve HF Hub cache entries to their active snapshot. No directories are scanned automatically: users opt in by adding their HF cache path to model directories. The resolved snapshot flows through the same _is_model_dir() → _register_model() pipeline as all other models.
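A minimal sketch of the resolution step, assuming the layout described above (models--Org--Name/refs/main holds a commit hash that names a directory under snapshots/). resolve_hf_cache_entry is a hypothetical stand-in for the PR's _resolve_hf_cache_entry(), whose code is not shown here. The cases it rejects mirror this PR's unit-test list, and reading refs/main inside try/except OSError instead of checking is_file() first follows the commit that removed the redundant is_file() check.

```python
from pathlib import Path
from typing import Optional

HF_PREFIX = "models--"


def resolve_hf_cache_entry(entry: Path) -> Optional[Path]:
    """Return entry/snapshots/<hash>/ for a models--Org--Name/ dir, else None.

    Hypothetical sketch, not the omlx implementation.
    """
    name = entry.name
    if not entry.is_dir() or not name.startswith(HF_PREFIX):
        return None  # regular directory, not an HF cache entry
    org, sep, model = name[len(HF_PREFIX):].partition("--")
    if not (org and sep and model):
        return None  # e.g. "models--gpt2": no Org--Name separator
    try:
        # refs/main stores the commit hash of the checked-out revision;
        # strip() tolerates surrounding whitespace such as a trailing newline.
        commit = (entry / "refs" / "main").read_text().strip()
    except OSError:
        return None  # refs/main missing or unreadable
    if not commit:
        return None
    snapshot = entry / "snapshots" / commit
    return snapshot if snapshot.is_dir() else None
```

Returning None for every malformed case lets the caller treat "not resolvable" uniformly and simply skip the entry.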

Changes

| File | Change |
|---|---|
| omlx/model_discovery.py | Add _resolve_hf_cache_entry(), which resolves models--Org--Name/ to snapshots/<hash>/ via refs/main; integrate it into discover_models() between the Level 1 and Level 2 checks |
| omlx/admin/routes.py | Add HF cache resolution to the list_hf_models() dashboard endpoint; extract the duplicated dedupe/size/append logic into an _add_model() helper |
| tests/test_model_discovery.py | 13 new tests: 6 unit tests for _resolve_hf_cache_entry() edge cases and 7 integration tests for discover_models() with HF cache layouts |
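The routes.py row describes pulling dedupe, size, and append into one helper so every discovery path (flat, org folder, resolved HF snapshot) registers models the same way. The following is a hypothetical sketch of that shape; collect_models, dir_size_bytes, and the dict keys are illustrative assumptions, not the actual list_hf_models() code.

```python
from pathlib import Path
from typing import Dict, Iterable, List, Set, Tuple


def dir_size_bytes(path: Path) -> int:
    """Sum sizes of regular files under path (is_file() and stat() follow symlinks)."""
    return sum(p.stat().st_size for p in path.rglob("*") if p.is_file())


def collect_models(candidates: Iterable[Tuple[str, Path]]) -> List[Dict[str, object]]:
    """Hypothetical listing loop: candidates are (display_name, model_path) pairs."""
    models: List[Dict[str, object]] = []
    seen_names: Set[str] = set()

    def add_model(name: str, path: Path) -> None:
        # Single place for dedupe + size + append, shared by all code paths.
        if name in seen_names:
            return  # the first occurrence of a name wins
        seen_names.add(name)
        models.append({"name": name, "path": str(path), "size": dir_size_bytes(path)})

    for name, path in candidates:
        add_model(name, path)
    return models
```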

How it works

discover_models(model_dir):
  for each subdir:
    1. Is it an adapter?           → skip
    2. Has config.json?            → Level 1: register directly
    3. Is it models--Org--Name/?   → NEW: resolve snapshot, register if valid model
    4. Otherwise                   → Level 2: scan as org folder

Existing flat and org-nested layouts are unaffected: the HF cache check only fires for directories matching the models--*--* naming pattern, and the continue statement after it ensures such directories never fall through to the org scan.
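The four-step flow can be sketched in Python as follows. This is an illustration, not omlx's discover_models(): the config.json check comes from the PR, while detecting adapters via adapter_config.json, keying HF entries as Org/Name, and passing the resolver in as a callable are assumptions made to keep the example self-contained.

```python
from pathlib import Path
from typing import Callable, Dict, Optional

HF_PREFIX = "models--"


def is_model_dir(path: Path) -> bool:
    return (path / "config.json").is_file()


def is_adapter_dir(path: Path) -> bool:
    # Assumption: adapters are recognised by a PEFT-style adapter_config.json.
    return (path / "adapter_config.json").is_file()


def looks_like_hf_cache_entry(name: str) -> bool:
    # models--Org--Name: the prefix plus at least one more "--" separator.
    return name.startswith(HF_PREFIX) and "--" in name[len(HF_PREFIX):]


def discover_models(
    model_dir: Path, resolve_hf: Callable[[Path], Optional[Path]]
) -> Dict[str, Path]:
    found: Dict[str, Path] = {}
    for sub in sorted(p for p in model_dir.iterdir() if p.is_dir()):
        if is_adapter_dir(sub):                    # 1. adapter -> skip
            continue
        if is_model_dir(sub):                      # 2. Level 1: register directly
            found[sub.name] = sub
            continue
        if looks_like_hf_cache_entry(sub.name):    # 3. HF cache entry
            snapshot = resolve_hf(sub)
            if snapshot is not None and is_model_dir(snapshot):
                model_id = sub.name[len(HF_PREFIX):].replace("--", "/")
                found[model_id] = snapshot         # model path is the snapshot dir
            continue                               # never fall through to org scan
        for child in sorted(p for p in sub.iterdir() if p.is_dir()):
            if not is_adapter_dir(child) and is_model_dir(child):  # 4. Level 2
                found[f"{sub.name}/{child.name}"] = child
    return found
```

Placing the HF check after the config.json check keeps Level 1 behaviour identical for flat directories, and the unconditional continue is what makes an unresolvable cache entry a no-op instead of an org folder.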

Testing

tests/test_model_discovery.py    88 passed (75 existing + 13 new)

New test coverage:

  • _resolve_hf_cache_entry(): valid entry, regular dir, missing org separator, missing refs/main, missing snapshot, whitespace stripping
  • discover_models(): single/multiple HF cache models, model_path points to snapshot, missing config.json skipped, mixed flat+HF cache, mixed org+HF cache, no fallthrough to org scan
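A pytest-style sketch of the fixture split described in the final commit: the bare cache layout (entry + refs + snapshot) lives in one helper, and the model variant adds a config.json on top. make_bare_hf_cache_entry and make_hf_cache_model are hypothetical stand-ins for the PR's _make_hf_cache_model and its shared helper; the trailing newline written to refs/main exists only to exercise the whitespace-stripping case.

```python
import json
from pathlib import Path


def make_bare_hf_cache_entry(root: Path, org: str, name: str, commit: str = "abc123") -> Path:
    """Create models--Org--Name/ with refs/main and an empty snapshot dir."""
    entry = root / f"models--{org}--{name}"
    (entry / "refs").mkdir(parents=True)
    # Trailing newline exercises the whitespace-stripping path in the resolver.
    (entry / "refs" / "main").write_text(commit + "\n")
    (entry / "snapshots" / commit).mkdir(parents=True)
    return entry


def make_hf_cache_model(root: Path, org: str, name: str, commit: str = "abc123") -> Path:
    """Same layout plus a config.json, so the snapshot looks like a model dir."""
    entry = make_bare_hf_cache_entry(root, org, name, commit)
    snapshot = entry / "snapshots" / commit
    (snapshot / "config.json").write_text(json.dumps({"model_type": "llama"}))
    return snapshot


def test_snapshot_reachable_via_refs_main(tmp_path: Path) -> None:
    snapshot = make_hf_cache_model(tmp_path, "Org", "Name")
    entry = tmp_path / "models--Org--Name"
    commit = (entry / "refs" / "main").read_text().strip()
    assert entry / "snapshots" / commit == snapshot
    assert (snapshot / "config.json").is_file()
```

Tests for "missing config.json skipped" can then build the bare entry directly, while the model-registration tests use the full helper.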

AlexWorland and others added 6 commits March 29, 2026 14:10
Resolve models stored in HuggingFace Hub's cache format
(models--Org--Name/snapshots/<hash>/) so they appear in both
the model discovery engine and the dashboard model list.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

13 new tests covering _resolve_hf_cache_entry() edge cases and
discover_models() integration with HF cache directory layouts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Remove redundant is_file() check before read_text(); the try/except
OSError already handles missing refs/main.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Match existing test file conventions: no other test class uses
inline section dividers.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Deduplicate the bare HF cache directory setup (entry + refs + snapshot)
into a shared helper. _make_hf_cache_model now calls through to it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Owner

@jundot jundot left a comment


Thanks for this, clean work.

The _resolve_hf_cache_entry() function is well structured and handles the edge cases properly (missing refs, missing snapshots, whitespace in hash). The continue placement is correct so HF cache dirs don't accidentally fall through to the org folder scan.

One minor note: in routes.py, the _add_model closure is defined before models and seen_names are declared. It works fine because of Python's late-binding closures, but moving it after those two lines would read more naturally. Not a blocker though.
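The late-binding point can be checked with a minimal, self-contained sketch; these functions are illustrative and are not the routes.py code. A nested function's free variables are looked up when it is called, so defining the helper first only causes trouble if it is called before the names are bound.

```python
def define_then_bind() -> list:
    def show() -> list:
        # `items` is a free variable of show(); Python resolves it in the
        # enclosing scope each time show() runs, not when it is defined.
        return list(items)

    items = ["a", "b"]  # bound after show() was defined, before it is called
    return show()


def call_before_bind() -> str:
    def show() -> list:
        return list(items)

    try:
        show()  # `items` has no value yet at this point
    except NameError as exc:  # an unbound free variable raises NameError
        return type(exc).__name__
    items = []  # this assignment makes `items` a variable of the enclosing scope
    return "not reached"
```

Because the listing helper is only called after both names are assigned, moving its definition below them changes readability, not behaviour.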

Tests look solid, merging.

@jundot jundot merged commit 81a7d7a into jundot:main Apr 14, 2026